Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[output] Adding new output for PagerDuty Incidents #467

Merged
merged 10 commits into from
Nov 16, 2017

Conversation

javuto
Copy link
Contributor

@javuto javuto commented Nov 10, 2017

to: @ryandeivert @jacknagz
cc: @airbnb/streamalert-maintainers
size: medium

Background

Migrating to the PagerDuty REST API helps to simplify the code and utilize features only present in v2. You can read up on more advantages of the migration to the new API here.
After adding support for the PagerDuty Events API v2, next step is to be able to create Incidents using the API.

Changes

  • Adding support for PagerDuty Incidents API v2 (REST API) as output. API reference for creating incidents..
  • Added tests for new output.
  • Adding context field as decorator in the alert, as RuleAttributes field. This allows specific rules to add some context to the alert record to be used in places like the outputs.
  • Added documentation for new field for rule context.

Testing

$ ./tests/scripts/unit_tests.sh
...
TOTAL                                            2388     81    97%
----------------------------------------------------------------------
Ran 425 tests in 8.940s

OK

$ ./tests/scripts/pylint.sh

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

$ python manage.py lambda test --processor rule all
...
StreamAlertCLI [INFO]: (59/59) Successful Tests
StreamAlertCLI [INFO]: (96/96) Alert Tests Passed
StreamAlertCLI [INFO]: Completed

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 95.445% when pulling 6c22135 on javier-streamalert-pagerduty-incidents into 6ae211b on master.

@javuto javuto force-pushed the javier-streamalert-pagerduty-incidents branch from 6c22135 to 0842ddf Compare November 10, 2017 18:25
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 95.445% when pulling 0842ddf on javier-streamalert-pagerduty-incidents into e75d559 on master.

Copy link
Contributor

@jacknagz jacknagz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome job, first round of comments!

context
~~~~~~~~~~~

``context`` is an optional argument which defines an extra field of information to pass on inside
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably be simplified to something like: context is an optional field to pass extra instructions to the alert processor on how to route the alert.

@@ -13,6 +13,7 @@
See the License for the specific language governing permissions and
limitations under the License.
"""
# pylint: disable=too-many-lines
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hah we need to start splitting these out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ @jacknagz I told him to do this for now. I also told @armtash to do it for his JIRA output stuff. I will break this out after these merge

Returns:
dict: Contains various default items for this output (ie: url)
"""
return {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can this be a single line?

def dispatch(self, **kwargs):
"""Send incident to Pagerduty Incidents API v2

Args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just declare Keyword Args instead of Args

return self._log_status(False)

# Extracting context data to assign the incident
rule_context = kwargs['alert']['context'][self.__service__]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if context isn't in the alert? does it always need to be defined for this output?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or rather, what if it's an empty map?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to jack's point, the default for this will be an empty dict, given this change: https://github.com/airbnb/streamalert/pull/467/files#diff-30cc622fccfff0217d84af2ff5bcd7ffR74

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even it is defaulted to an empty dict, I will add a test for that because the output should just take the default escalation policy if no context is passed.

policy_to_assign = creds['escalation_policy']

policies_url = os.path.join(creds['api'], self.POLICIES_ENDPOINT)
policy_id = self._check_exists_get_id(policy_to_assign,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use consistent line breaks

'details': ''
}
# We need to get the service id from the API
services_url = os.path.join(creds['api'], self.SERVICES_ENDPOINT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two lines seem to be repeated in this function, can this be extracted into a helper?

@@ -268,6 +268,144 @@ def test_dispatch_bad_descriptor(self, log_error_mock):

log_error_mock.assert_called_with('Failed to send alert to %s', self.__service)

class TestPagerDutyIncidentOutput(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test case for an empty context arg

# If there are results, get the first occurence from the list
return response and response.get(target_key)[0]['id']

def dispatch(self, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an expected schema for the context either here or in the class docstring

Copy link
Contributor

@ryandeivert ryandeivert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first round of comments

@@ -13,6 +13,7 @@
See the License for the specific language governing permissions and
limitations under the License.
"""
# pylint: disable=too-many-lines
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ @jacknagz I told him to do this for now. I also told @armtash to do it for his JIRA output stuff. I will break this out after these merge

return self._log_status(False)

# Extracting context data to assign the incident
rule_context = kwargs['alert']['context'][self.__service__]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to jack's point, the default for this will be an empty dict, given this change: https://github.com/airbnb/streamalert/pull/467/files#diff-30cc622fccfff0217d84af2ff5bcd7ffR74

}]
# If the user retrieval did not succeed, default to policies
else:
user_to_assign = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this else and add the default value to the get(..) above:

user_to_assign = rule_context.get('assigned_user', False)


# Start preparing the incident JSON blob to be sent to the API
incident_title = 'StreamAlert Incident - Rule triggered: {}'.format(kwargs['rule_name'])
incident_body = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the type and details keys within this required? It looks like these values never get set, and this is just passed as the `body in the incident payload

If these are required, can you add a comment saying so and why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are required, I just forgot about them :)

'id': service_id,
'type': 'service_reference'
}
incident_priority = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be a fan of just moving this dict and the above dict right into the structure below for the incident.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was digging into the API docs and it turns out that this is just optional, so maybe something else to provide via context.

if not user_to_assign and rule_context.get('assigned_policy'):
policy_to_assign = rule_context.get('assigned_policy')
else:
policy_to_assign = creds['escalation_policy']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this above the else so it's set by default and then just set it, like so:

policy_to_assign = creds['escalation_policy']
if not user_to_assign and rule_context.get('assigned_policy'):
    policy_to_assign = rule_context.get('assigned_policy')

one less line :)

# If there are results, get the first occurence from the list
return response and response.get(target_key)[0]['id']

def dispatch(self, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this dispatch function is huge - can you move some of it to functions to make it more readable and make testing easier

@ryandeivert ryandeivert added this to the 1.6.0 milestone Nov 11, 2017
@javuto javuto force-pushed the javier-streamalert-pagerduty-incidents branch from 0842ddf to f2ca160 Compare November 13, 2017 23:18
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.4%) to 95.283% when pulling f2ca160 on javier-streamalert-pagerduty-incidents into b1966aa on master.

index (int): test_index value (0 by default)
context(dict): context dictionary (empty by default)
"""
if not context:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be simplified to context = context or {}

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 95.404% when pulling 33c0f77 on javier-streamalert-pagerduty-incidents into b1966aa on master.

@javuto javuto force-pushed the javier-streamalert-pagerduty-incidents branch from 33c0f77 to eb17826 Compare November 14, 2017 19:01
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 95.455% when pulling eb17826 on javier-streamalert-pagerduty-incidents into 71a51cf on master.

@javuto javuto force-pushed the javier-streamalert-pagerduty-incidents branch from eb17826 to ac63b6b Compare November 15, 2017 16:54
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 95.455% when pulling ac63b6b on javier-streamalert-pagerduty-incidents into bd8b80f on master.

Copy link
Contributor

@ryandeivert ryandeivert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments - this is looking great, thanks for addressing everything so far :)

return self._log_status(False)

# Preparing headers for API calls
headers = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: just set self.headers right here (or self._headers once the instance properties are made protected)


def __init__(self, *args, **kwargs):
StreamOutputBase.__init__(self, *args, **kwargs)
self.base_url = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make these instance properties protected? ie - self._headers, etc.. since they should never be accessed from outside an instance of this class.


# If there are results, get the first occurence from the list
target = response.get(target_key, False)
if response and target:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: returning early if there is ever the opportunity to do so, instead of having to do compound if checks throughout a function helps with decrease confusion and increase readability. For instance, this could be make more readable via:

response = resp.json()
if not response:
    return False           # Maybe even add a LOGGER.error(...) here (?)

# If there are results, get the first occurence from the list
target = response.get(target_key, False)
if not target:
    return False           # Maybe add a LOGGER.error(...) here too (?)

return target[0]['id']

Theoretically the last if/return can be consolidated into just:
return response['target_key'][0]['id'] if 'target_key' in response else False

Just food for thought :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like replacing the last if/else with that super pythonic statement

"""Method to verify the existance of a service with the API

Args:
api_url (str): Base URL of the API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this args list needs updated to only include service

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javuto still seeing api_url in this list but this isn't an arg in the function declaration 🤔

# Extracting context data to assign the incident
rule_context = kwargs['alert'].get('context', {})
if rule_context:
rule_context = rule_context[self.__service__]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance that this service won't exist within the rule_context dict? It looks like this is performed without a get so is it safe to assume there will always be a value in it for this service? (I don't think this is the case so just making sure I'm not missing something?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presumed that the service will be there, but it is a good idea to make sure and add some checks. Nice catch!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the context that is sent from the rule processor, correct? Cause I'm thinking that will only rarely get set in some rules, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would be the approach. The context element will be there, even if it is just an empty dict but the __service__ could be missing. Better do the check.

@javuto javuto force-pushed the javier-streamalert-pagerduty-incidents branch from ac63b6b to b7c9903 Compare November 15, 2017 21:50
@coveralls
Copy link

Coverage Status

Coverage increased (+0.07%) to 95.755% when pulling b7c9903 on javier-streamalert-pagerduty-incidents into 0adf349 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.06%) to 95.746% when pulling c70cf67 on javier-streamalert-pagerduty-incidents into 0adf349 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.06%) to 95.746% when pulling 005524e on javier-streamalert-pagerduty-incidents into 0adf349 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.06%) to 95.746% when pulling 51ac380 on javier-streamalert-pagerduty-incidents into 0adf349 on master.

@javuto javuto force-pushed the javier-streamalert-pagerduty-incidents branch from 51ac380 to f620ec6 Compare November 16, 2017 00:40
@coveralls
Copy link

Coverage Status

Coverage increased (+0.06%) to 95.746% when pulling f620ec6 on javier-streamalert-pagerduty-incidents into 4e4e9a8 on master.

Copy link
Contributor

@ryandeivert ryandeivert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address last comment, then LGTM

@@ -392,7 +390,7 @@ def dispatch(self, **kwargs):
# Extracting context data to assign the incident
rule_context = kwargs['alert'].get('context', {})
if rule_context:
rule_context = rule_context[self.__service__]
rule_context = rule_context.get(self.__service__, {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we break out the logic below into a function that returns a tuple of the user_to_assign and the assigned_value? just trying to understand this flow better

@coveralls
Copy link

Coverage Status

Coverage increased (+0.05%) to 95.741% when pulling 84dc0ca on javier-streamalert-pagerduty-incidents into 4e4e9a8 on master.

return 'assignments', [{'assignee': user_assignee}]

# If escalation policy was not provided, use default one
policy_to_assign = context.get('assigned_policy') or self._escalation_policy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with a lookup like this, you can use a default value with the get like context.get('assigned_policy', self._escalation_policy) instead of doing the ... or ...

@coveralls
Copy link

Coverage Status

Coverage increased (+0.05%) to 95.741% when pulling 8cb7b02 on javier-streamalert-pagerduty-incidents into 4e4e9a8 on master.

@javuto javuto force-pushed the javier-streamalert-pagerduty-incidents branch from 8cb7b02 to 056c983 Compare November 16, 2017 18:26
@coveralls
Copy link

Coverage Status

Coverage increased (+0.05%) to 95.745% when pulling 056c983 on javier-streamalert-pagerduty-incidents into bc3391c on master.

@javuto javuto merged commit c1b293a into master Nov 16, 2017
@javuto javuto deleted the javier-streamalert-pagerduty-incidents branch November 16, 2017 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants